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Abstract 

In a recent article [l[ , the FEAST algorithm has been presented as a general 
purpose eigenvalue solver which is ideally suited for addressing the numerical 
challenges in electronic structure calculations. Here, FEAST is presented be- 
yond the "black-box" solver as a fundamental modeling framework which can 
naturally address the original numerical complexity of the electronic struc- 
ture problem as formulated by Slater in 1937 0. The non-linear eigenvalue 
problem arising from the muffin-tin decomposition of the real-space domain 
is first derived and then reformulated to be solved exactly within the FEAST 
framework. This new framework is presented as a fundamental and practical 
solution for performing both accurate and scalable electronic structure cal- 
culations, bypassing the various issues of using traditional approaches such 
as linearization and pseudopotential techniques. A finite element implemen- 
tation of this FEAST framework along with simulation results for various 
molecular systems are also presented and discussed. 
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1. Introduction 



Since the 1930's, progress in electronic structure calculations has al- 
ways been tied together with advances in numerical strategies for address- 
ing the eigenvalue Schrodinger equation. In particular, several attempts 
have been undertaken to reduce the complexity of the eigenvalue problem 
in self-consistent calculations by dissociating, screening or removing the ef- 



fect of the core electrons. Most used techniques include 3, |4J: muffin-tin 
approximations along with augmented plane wave (APW) 2] and linearized 
APW, muffin-tin orbitals (MTO) and linearized MTO, KKR methods @L 
augmented spherical wave methods [gj], pseudopotential approaches @,ls| 
and the projector augmented-wave method [§]. The conceptual approach of 
the former consists in partitioning the real space into spheres around each 
atom, allowing different discretization and solving strategies to take place in 
separate regions in space. Therefore, the atom-centered regions can benefit 
from specific discretization schemes (i.e. basis sets) that are both suitable to 
capture the highly localized core states around the nuclei and considerably 
reduce the effective size of the resulting eigenvalue problem in the intersti- 
tial region. This approach can be cast as a domain decomposition method 
which, in modern days, is most suitable for parallel computing since calcu- 
lations on all these sub-domains can also be performed independently. Once 
the eigenvalue problem is reformulated using domain decomposition strate- 
gies, however, the resulting (and still exact) problem now takes the form 
of a non-linear one in the interstitial region (i.e. H(E)ip = Eif)) since the 
boundary conditions at the interface with the atom-centered regions are en- 
ergy dependent. The major difficulty of solving this non-linear eigenvalue 
problem has been largely avoided by mainstream approaches to electronic 
structure calculations that rely mostly on approximations ranging from di- 
rect linearization techniques (e.g. LAPW, LMTO, etc.) [lo|, 0, E3, EE G 



to pseudopotential techniques [8|, [15( that eliminate the core states. 

This paper presents a fundamental strategy for performing all-electron 
(i.e. full-potential) electronic structure calculations which relies entirely on 
the capabilities of the new FEAST algorithm framework for solving the eigen- 
value problem [lj]. At first, the algorithm can operate in parallel to obtain 
core and valence electrons independently spanning different energy ranges. 
Secondly, solving the original eigenvalue problem within a given energy range 
(i.e. search interval) is mainly reformulated into solving a set of well-defined 
independent linear systems along a complex energy contour. As a result, 



2 



the muffin-tin domain decomposition used to partition the real-space can act 
directly on the linear systems; therefore, the eigenvalue problem does not 
need to be explicitly formulated into a non-linear one. In comparison to lin- 
earization techniques, the resulting linear systems also need to be evaluated 
for a certain set of pivot energies, but those are now explicitly provided by 
FEAST to guarantee the convergence of the solutions in the entire system. 
Additionally, the complexity of interstitial problem scales linearly with the 
number of atoms, and it can be shown that the proposed highly accurate all- 
electron muffin-tin framework is also potentially capable of better scalability 
performances than pseudopotential approaches on parallel architectures. 

The outline of this paper is as follows: In section [2] we briefly summa- 
rize the numerical steps and properties of the FEAST algorithm presented in 
[l[ for solving the traditional eigenvalue problem. Section [3] presents a gen- 
eral definition of the muffin-tin strategy and the derivation of the resulting 
non- linear eigenvalue problem. Section H] describes how FEAST can be effec- 
tively and generally used to solve the muffin-tin problem without resorting 
to linearization or other approximations. The capabilities of the new all- 
electron framework as compared to other approaches are then discussed in 
Section [5] and illustrated using finite element (FEM) simulations for solving 
the DFT/Kohn-Sham problem on various molecular systems. 

2. The FEAST Algorithm 

In electronic structure calculations, one considers solving the Schrodinger- 
type equation in an entire domain Q which can be finite, periodic, or Bloch 
periodic: 



where {Ei,^>i} are the resulting eigenpairs (also parametrized by k in the 
case of bandstructure calculation using a Bloch periodic system). Thereafter, 
any discretization schemes in Q would result in the generalized and linear 
eigenvalue problem of size N: 



where S is a positive definite matrix (mass matrix) obtained using non- 
orthogonal basis functions (S = I otherwise), and \I/ contains the N unknown 
components of the wave function (e.g. basis set coefficients, nodal values, 
etc.). 




(1) 



(2) 
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FEAST is both a new numerical algorithm [l[ and a new general pur- 



pose high-performance numerical library [16[ for solving the standard or 
generalized eigenvalue problem of type (j2J), and obtaining all the eigenvalues 
and eigenvectors within a given search interval [E min , E max ]. FEAST is a 
density-matrix-based algorithm which differs fundamentally from traditional 
approaches for solving the eigenvalue problem (such as Lanczos, Arnoldi, 
Jacobi-Davidson, etc.) by combining properties of numerical linear algebra 
and complex analysis. Its main computational tasks consist of solving very 
few independent linear systems with multiple right-hand sides along a com- 
plex contour and one reduced dense eigenvalue problem orders of magnitude 
smaller than the original one (the size of this reduced problem is of the order 
of the number of eigenpairs inside the search interval). The basic FEAST 
algorithm for symmetric eigenvalue problem detailed in [l[ is briefly summa- 
rized in the following. 

Starting from a set of M linearly independent random vectors Y NxMq = 
{yii Y2, -yMoli where M is chosen greater than the number of the eigenval- 
ues M in the search interval (i.e. M represents then an over-estimation of M 
which is not known a priori), a new set of vectors Q NxMq = {<ii,<l2, -qMo} 
is obtained as follows: 

^M Q = -7^J c dZG(Z)Y NxMo , (3) 

where C represents a complex contour from E min to E max , and the system 
Green's function G at the complex energy Z is defined by G(Z) = (ZS — 

n)-\ 

In practice, the vectors Q in (EJ) can be computed using a high-order 
numerical integration where only very few linear systems G(Z)Y need to be 
solved along the complex contour C i.e. 

(ZS - H)Q( Z ) = Y, (4) 

where Q'- 2 -' denotes the set of responses at a given pivot energy Z for a 
given set of excitations Y in Q. Since G(Z) = G'(Z), (where > stands for 
transpose conjugate), it can be shown that the numerical integration in (|2D 
can be performed on the positive half circle of the complex contour C + (see 
Figure [T]) . 

Using a Rayleigh-Ritz procedure, and by computing 

H Q = Q f HQ and S Q = Q f SQ, (5) 
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Figure 1: Representation of the complex contour integral defined by the positive half circle 
C + for a given search interval [E m i n ,E max \. In practice, the vectors Q are computed via a 
numerical integration (e.g. Gauss-Legendre quadrature) where only very few linear systems 
G{Z)Y Op needs to be solved at specific points Z e along the contour. 

a projected reduced dense eigenvalue problem of size M can be formed: 

H Q $ = eS Q $. (6) 

This reduced problem can be solved using standard eigenvalue routines for 
dense systems to obtain all the eigenpairs {e m , $ m }. By setting E m = e m and 
computing & NxMo = Q NxMq $ Mo xm > ^ follows that if E m lies inside the con- 
tour, it is an eigenvalue solution and its eigenvector is ^$> m (the m th column of 
The eigenvectors are also naturally S-orthonormal, if the eigenvectors 
of the reduced problem are SQ-orthonormal. Accuracy can be systematically 
improved using a new set of initial guess vectors Y = SSI/ iteratively up until 
convergence. 

The FEAST algorithm offers many important and unique capabilities for 
achieving accuracy, robustness, high-performance and scalability on paral- 
lel computing architectures. The algorithm holds indeed all the following 
intrinsic properties: 

• Using a high-order numerical integration scheme such as Gauss-Legendre 
quadrature, 8 to 16 contour points suffice for FEAST to consistently 
converge in ~2 to 3 iterations to obtain up to thousands of eigenpairs 
with very high accuracy. 

• FEAST benefits from an exact mathematical derivation and naturally 
captures all multiplicities. 



ax 
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No (explicit) orthogonalization procedure is required. 
FEAST has the ability to re-use the basis of a pre-computed subspace 
for fast convergence or as suitable initial guess for solving a series of 
eigenvalue problems that are close one another (e.g. for bandstructure 
calculations [lj, time-dependent propagation 17|, etc.). 



FEAST allows the use of iterative methods for solving the inner linear 
system, and which are ideally suited for large-sparse problems. 



Finally, efficient parallel implementations for FEAST can be addressed 
at three different levels [l6|, 18 1: (i) many search intervals can be run in- 
dependently (no overlap), (ii) each linear system in ([3]) can be solved inde- 
pendently along the complex contour C, (e.g. simultaneously on different 
compute nodes), (iii) the linear system fll]) can be solved in parallel (the 
multiple right sides can be parallelized as well). Consequently, one can show 
that if enough parallel computing power is available, the main computational 
cost of FEAST for solving the eigenvalue problem can be ultimately reduced 
to solving only one linear system fll]). This problem can be in principle ad- 
dressed by taking advantage of the many advances in "black-box" direct or 
iterative parallel system solvers. However, a domain decomposition strategy 
such as muffin-tin described in the next section, is naturally more suited 
to address specifically the electronic structure problem within a multi-atom 
centered environment. 



3. From Linear to non-linear eigenvalue problem using the muffin- 
tin strategy 

A muffin-tin strategy for the electronic structure problem has been pro- 
posed as early as the 1930's, and in particular, it has been used as a starting 
point for the APW method introduced by Slater A muffin-tin decompo- 
sition brings flexibility in the discretization step, reduces the main computa- 
tional efforts within the interstitial region alone, and should also guarantee 
maximum linear scalability performances using modern parallel computing 
architectures. Without any loss of generality, Figure |2] illustrates the essence 
of the muffin-tin domain decomposition (here using the particular example of 
a real-space mesh discretization for the Benzene molecule). For the particular 
case of APW, the atom-centered regions use atomic orbitals basis functions, 
while a plane wave expansion scheme is used for the interstitial region. The 
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derivation that follows, however, is independent of the type of basis function 
used within the muffin-tin decomposition. 





Figure 2: Using a muffin-tin domain- decomposition method, the whole simulation domain 
51 is separated into multiple atom-centered regions £lj (j > 0) and one large interstitial 
region Q,q. Different basis-sets can be used independently to describe the different regions. 
The figure on the left represents a 2D view of local finite element discretization using 
a coarse mesh for Qq (represented only partially here) connecting all of the atoms of a 
Benzene molecule. In contrast, the figures on the right represent a finer mesh for the £lj 
regions suitable to capture the highly localized core states around the nuclei. 



Formally, the solutions {Ei,^i} that satisfy the continuum model ([I]), 
can also be obtained from a Schrodinger equation in the interstitial region 
Qo alone, provided that appropriate boundary conditions are imposed at the 
interfaces Tj with the atom-centered region Qj i.e. 



M(x) = £*(x), x G fin 



(7) 



where H Q is the Hamiltonian in Q . A general mathematical form for these 
boundary conditions on Tj supplies a relation between the normal derivative 
of the solution and their boundary values (Vj): 



1 9*(x) 

2 dry 



/ dx Ej(E,x,x) *(x'), 



x g r 



(8) 
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where H = m = 1, r\j represents the external normal at Tj, and Ej is a non- 
local and energy dependent operator (i.e. self-energy) which can be derived 
from the the atom-centered Green's function Gj in Qj. This later is given by 

(E - H^GjiE,*,*) = <5(x-x'), x,x efij-, (9) 

where ifj is the Hamiltonian in Qj, and it is important to note that Gj can 
be constructed with arbitrary boundary conditions at Tj. For instance, by 
choosing the Green's function Gj to have zero derivative on Tj (i.e. homo- 
geneous Neumann boundary conditions), one can obtain from the Green's 
identity a simple expression for Ej (inverse of the surface Green's function): 

Ej(E, x, x') = Gj\E, x, x'), x, x G Tj. (10) 



This derivation was originally introduced in [19[ as an embedding potential 
technique for the Schrodinger equation. Alternatively, another simple expres- 



sion for Ej has been derived in [20j using homogeneous Dirichlet boundary 
conditions for Gj on Tj. After discretization of ([7]) using the condition (JBD 
(and usually performed on the variational form of the problem), the resulting 
non-linear eigenvalue problem in Qq takes the general form 

(H -U£j(£))*o = £So*o, (11) 



where So is the mass matrix in Q , contains the unknown components 
of the solution in Q Q , and is the self-energy matrix obtained from 

the discretization of fflOl) in Qj coupling all the unknowns on Tj (i.e. non- 
local term on Tj) and formally added here (for clarity) to the interstitial 
Hamiltonian. 

Alternatively to a continuum treatment of the problem ([TJ), one could 
directly replace the unknown components of Sfj belonging to the interior 
subdomains Qj from the system matrix §2§ by the following self-energy 21 



E j (£;)=r j G j 7j T , (12) 

where Tj describes the interaction between Q and the atom-centered region 
Qj. In linear algebra, this non-overlapping domain decomposition procedure 
gives rise to a reduced non-linear system similar to fill I) which is known as 



the Schur complement [22 
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As originally noted by Slater [2] for the particular case of APW, this 
non-linear eigenvalue problem fill I) gives rise to an energy dependent secular 
equation which cannot be handled by traditional linear eigenvalue algorithms. 
Difficulties would include in particular: absence of orthogonality for i n 
Q , and a non-linear reduced system (Q if a Rayleigh-Ritz procedure is used. 
Although solving explicitly the non-linear eigenvalue problem f lTTj) is not 
impossible 23|, |24J, it remains practically very challenging. 

In practice, and as mentioned previously, the muffin-tin decomposition 
has always been associated with a new level of approximations for solving 
the resulting non-linear eigenvalue problem in the interstitial region. The 
mainstream approaches to all-electron electronic structure calculations rely 
indeed almost entirely on approximations such as direct linearization tech- 
niques (e.g. LAPW, LMTO, etc.) [k1 EH, H, El • Alternatively, linear eigen- 



value problems can also be obtained from pseudopotential techniques j&TloT] 
using smooth but non-local potentials in atom-centered regions that elimi- 
nate the core states while introducing the notion of pseudo-wavefunctions. 

The next section describes how FEAST can be effectively (and generally) 
used to bypass these issues. 



4. Implicit treatment of the non-linear problem using FEAST 

Here, we propose to address implicitly the non-linear eigenvalue problem 
(11 ip by noting that the linear system (j3J) arising from FEAST applied in the 
entire domain Q, can be directly solved using the same muffin-tin domain 
decomposition. 

At first, starting from a set of excitations Y(x) in the continuum domain, 
the set of responses can also be obtained by solving the Schrodinger 
equation ([7]) in Q alone: 

(Z-tf )Q^(x) = r(x), xefio, (13) 

where the boundary condition for on Tj should formally satisfy ([8]) but 
augmented by a source term Fj (x) (to add to the right hand side) which 
accounts for the effects of the excitations Y(x) within the atom-centered 
regions Qj. For instance, using Neumann boundary conditions for Gj, the 
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self-energy Ej has been defined in (fTUl) and one can show that (Vj): 
Ff ) (x)= / dx! GjXZ,x,x!)x 

[ dx" Gjiz^y) y(x")l, xer r (14) 

Once is known in f2 an d hence on all the 1^- interfaces, the solutions 
in the flj domains can be independently retrieved Vj by solving the linear 

systems 

(Z-H j )QW(x)=Y(x), xe% (15) 

with Dirichlet boundary conditions. 

After discretization of ffTB"]) along with the boundary conditions ([8]), ffTOj) . 
and (THj) . solving flU) in the entire domain Q can then be replaced exactly by 
the following three stage procedure: 

1. For all j atoms: 

• obtain the self-energy Ej(Z) by computing only selected elements 
of the atom-centered Green's function matrix Gj: 

G j (Z) = (ZS j -H j )" 1 , (16) 

• obtain the discretized form of the local source term Fj given by: 

F{ ss) = S J (Z)Gj(Z)Yj l (17) 

2. solve the following linear system for the unknown components of the 
solutions Qq in fi 

(ZS - Ho + USj(^))Q Z) = Y + |jFf \ (18) 

3 J 

where self-energy and source term matrices in the atom-centered re- 
gions j have been formally added (for clarity) to the interstitial system 
matrix. 

3. For all j atoms, solve the sub-problem ffl~5|) to obtain the unknown 

(z) 

components of the solutions Qj in the atom-centered regions Qj. 
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Thereafter, the subspace Q in ([3]) is obtained by integration of the set of 

CZ) (Z) 

solutions Qq and all the Q. over the complex contour C. In practice, it 
is possible to construct H q and S q in (J5J) directly from the projection of H 
and So for Qq and Hj and Sj for all Qj. 

As a result of (fl8l) which is solved only for specific complex pivot energies 
Z, the non-linearity of the Schrodinger equation ( 1111) in Q is then explicitly 
removed, and the muffin-tin problem benefits now from an exact numeri- 
cal treatment (i.e. no approximations needed). In contrast to linearization 
techniques, these pivot energy located in the complex plane are explicitly 
provided by FEAST (e.g. Gauss contour point in Figure [I]) to guarantee 
global convergence of the solutions of the Schrodinger equation in the whole 
simulation domain Q. 

In practice, it is important to mention that the additional computational 
costs by pivot energy Z for obtaining F. (in step 1 above) and retrieving 

the solution Qj in Qj (in step 3 above), can be made minimal. At first, 
the muffin-tin decomposition naturally allows each atom-centered region to 
be factorized and solves independently. As a result, the computations for ob- 
taining the self-energy the source term Fj, and retrieving the solution Qj, 
can be fully parallelized Vj. Additionally, most of the efforts that have been 
devoted for obtaining Sj(Z) do not need to be repeated (e.g. factorization 
of the matrix (ZSj — Hj), computations of some key elements of Gj). 



5. Application: an all-electron real-space mesh implementation 

In order to illustrate the efficiency and capability of the proposed FEAST 
muffin-tin framework, we propose to comment on our first-principle DFT/LDA 
all-electron simulations obtained on various molecular molecular systems us- 
ing a finite element discretization. 

5.1. Definition of the muffin-tin finite- element mesh 

As illustrated in Figure [2j the 3D finite-element muffin-tin mesh can be 
built in two steps: (i) a 3D atom-centered mesh which is highly refined around 
the nucleus, and (ii) a much coarser 3D interstitial mesh that connects all 
the atom-centered holes (both meshes have been generated using the TetGen 
software (25|). For the atom-centered mesh, which has been chosen com- 
mon to all atoms, we have used successive layers of polyhedra similar to the 



ones proposed in [26|. This discretization provides both tetrahedra of good 



quality, an arbitrary level of refinement (i.e. the distance between layers can 
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be arbitrarily refined while approaching the nucleus), and the same number 
of surface nodes. Indeed, the outer layer is consistently providing the same 
(relatively small) number of connectivity nodes rij with the interstitial mesh 
at Tj (i.e. rij = 26, 98, or 218 nodes respectively using linear, quadratic or 
cubic FEM). Consequently, the size of linear system (j!8p in the interstitial re- 
gion Q Q would stay independent of the atom-centered regions system matrix 
size, and the approach can then ideally deal with full potential (all-electron). 
Additionally and as shown in Figure [3] for different molecular configurations, 
the interstitial region scales linearly with the number of atoms while the 
size of the atom-centered mesh remains constant. In contrast to the (re- 
constructed) full mesh, the interstitial mesh exhibits a dramatic decrease in 
number of mesh points and a more advantageous linear scaling rate. 

5.2. Practical considerations 

Atom-centered, interstitial, or full finite element discretization, provide 
highly sparse system matrices. As mentioned in Section [21 FEAST can be di- 
rectly applied for solving the eigenvalue problem obtained from the full mesh 
discretization. In particular, this can be accomplished by using the prede- 



fined sparse interfaces from the FEAST software package [16j (which is linked 
to the direct system solver PARDISO [27] by default). The software package 
also features a reverse communication interface that enables a straightfor- 
ward substitution of the inner linear system factorization and solve stages 
along the complex contour by the muffin-tin three-step procedure described 
in Section HI In particular, once the linear systems f|T3|) and f|T5|) formed, 
they can be factorized and solved using the direct system solver PARDISO 
(for example). We also note that only rij columns of Gj (1161) associated to 
the nodes at the boundary Tj are needed to compute both the self-energy 
Sj(Z) and the source term F- in ( IT7|) and this independently on the total 
number of nodes inside Qj. Indeed, after discretization of ()9]) and ( TTUj) . and 
assuming a particular ordering of the matrix elements (for clarity), it results 

(Vj): 

£j(£) = ([I n . ... 0] (JSSj - Hj)- 1 [I n . Of)" 1 , (19) 

where the matrix Sj of size nj contains all the non-zero elements of Sj. The 
operation costs for (fT9l) include solving a linear system with nj right hand 
sides and the inversion of a very small matrix of size nj. 

Additionally, one can show using a proper reordering of the matrix ele- 
ments, that the same matrix factorization applied on (ZSj — Hj) to compute 
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Sj(Z), can also be effectively used to compute F. , as well as the solution 
Q] (fl5|) using a single solve stage [28 . 

5.3. Results and Discussions on Scalability 

At first, the results for muffin-tin FEAST framework applied to the Hy- 



drogen atom, as well as the H2 and + molecules 29_|, are compared to 
analytic solutions for verification purposes. For all examples, a cubic box 
with edges of 16A was used for the interstitial region, and the radius of 
the atom-centered region has been set to 0.35A (the proposed framework 
is however applicable independently of the choice of this atom-centered ra- 
dius). The different mesh sizes using P3 FEM have been presented in Figure 
El while a representation of the muffin-tin mesh for and molecules 
is given in Figure HJ 

The full-mesh and muffin-tin approaches are fundamentally identical, 
therefore the obtained numerical results are the same (within machine ac- 
curacy). In comparison to analytical results, a very good agreement for the 
fundamental energy is obtained with 0.25%, 0.08% and 0.05% respectively 
for H, H% and H^ + . Since the system is here bounded by a fixed size sim- 
ulation box, by increasing the number of atoms (or atomic weight), one also 
increases the locality of the solutions in the molecular region and the overall 
accuracy. In a more general situation, accuracy can be systematically in- 
creased by refining the atom-centered mesh along with the interstitial mesh 
in the molecular region (i.e. increasing the number of nodes or the order of 
the FEM). 

For the case of the Benzene molecule (see Figure |2]), all-electron simu- 
lations have been performed self-consistently using both a GR-pulay proce- 
dure [3oj], and an alternative approach deriving from a generalization of the 
FEAST algorithm for solving directly the full coupled non-linear DFT/Kohn- 



Sham problem [31] . While a full three-dimensional discretization using cubic 
finite element gives rise to sparse system matrices of size of N = 133,579 in 
Q (see Figure [3]), the muffin-tin decomposition provides, in turn, a series of 
subsystems of size Nj = 9457 in the atom-centered regions Qj with rij = 218 
on Fj, and a single system of size N = 22, 711 in the interstitial region Q . 
Our total energy found at self-consistent convergence .Elda = —6261.14 eV, 
is also in very good agreement with the accuracy of the all-electron simulation 
results presented in [26| . 

The muffin-tin decomposition naturally allows each atom-centered region 
to be factorized and solved independently; as a result, the computation for 
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obtaining the self-energy Xj, the source term Fj, and retrieving the solution 
Qj, can be fully parallelized. Since the scalability of FEAST on parallel 
architecture is first and foremost dependent on the scalability of the inner 
linear system solver (i.e. third level of parallelism for FEAST), the main 
difficulty would come from solving the interstitial system ffl8|) in parallel. As 
shown in Figure [31 the size of the interstitial system does increase linearly 
but at a much lower rate than the full system (i.e. N << N for large 
number of atoms). For example, while iV = 133,579 is used for the full 
twelve atoms system of the Benzene molecule, the same size order would 
only be reached by the interstitial system (iVo) using 85 atoms. Similarly, 
500 atoms would only generate an interstitial system of moderate size iV = 
750,000, while iV ~ 5 x 10 6 would be required for the full system (using 
Nj ~ 10000). It is important to note that the muffin-tin approach enables 
the use of hybrid basis-set, and a quadratic rather a cubic FEM discretization 
could then potentially be used for the interstitial region that exhibits low- 
varying potential (and N would then decrease significantly). Finally, the size 
rij of the non-local blocks in Ej (fT9~|) being relatively very small, the linear 
scalability of the linear system ( JT8"j) is then not affected by the presence of the 
self-energy matrices. Consequently, in practice the interstitial system can be 
efficiently solved using "black-box" parallel sparse system solver that can be 
either direct or iterative (using an appropriate preconditioner). 

5.4- FEAST all-electron calculations vs pseudopotential 

The FEAST framework is independent of any particular form of the po- 
tential in the atom-centered regions (such as spherically symmetric potential, 
etc.). Since the size of linear system (TIB"]) in the interstitial region Qq is inde- 
pendent of the discretization schemes for the atom-centered regions Qj, the 
approach can ideally deal with full potential within self-consistent calcula- 
tions (i.e. all-electron calculations). 

The development of techniques such as pseudopotential were originally 
motivated to ease several numerical difficulties that one can encounter with 
all-electron calculations in the atom-centered regions @, 0, 15]. Let us then 



outline how some of these main issues are naturally addressed within the 
FEAST all-electron framework: 

(i) Since FEAST can act independently on different energy ranges (with 
no overlap), the number of states in a search interval can be narrowed as 
desired, and the frozen-core approximation does not need to be considered 
within self-consistent iterations. 



14 



(ii) In contrast to pseudopotential, it is clear that a much finer level of 
discretization for the FEAST all-electron framework is needed to capture 
the (true) wave functions in the atom-centered regions flj (i.e. the pseudo- 
wavefunctions can be captured within a reduced basis set). The linear eigen- 
value system obtained using pseudopotential can either be seen as a much 
smaller size system as compared to (j2J), or a linearized version of (|lip where 
Sj represents then a fully non-local pseudopotential over the atom-centered 
region Qj. Although this pseudopotential system can also ideally be solved 
using the FEAST algorithm, the resulting system matrix (j4]) would end up 
(paradoxically) being larger and less scalable than ([18]) (which only requires 
the surface terms Tj of the Qj regions). 

6. Conclusion 

In 1937, Slater originally derived a non-linear electronic structure problem 
by introducing the APW method using a muffin-tin domain decomposition; 
he then stated |2|, "Of course, we cannot solve this exactly, and we must 
look for methods of approximations" . These limitations have historically mo- 
tivated the development of a wide spectrum of approximation techniques 
ranging from direct linearization to pseudopotential methods. Within the 
framework of the FEAST algorithm, however, the muffin-tin problem bene- 
fits now from an exact numerical treatment (i.e. no approximation needed) 
which consists of removing the non-linearity of the Schrodinger equation ( II ip 
in the interstitial region fl , by considering only certain complex pivot ener- 
gies Z ffT8]) . In contrast to linear approximations (including LAPW, LMTO, 
linearized embedding method f!9j, etc.), these pivot energies are explicitly 
provided by FEAST with guaranteed convergence of the solutions of the 
Schrodinger equation in the whole simulation domain. As compared to ab- 
initio pseudopotential approaches, in fact, not only the proposed FEAST 
framework is more accurate by essence (since all-electron calculations are 
performed), it is also capable of higher parallel scalability performances. In 
conclusion, the new approach is not tied together with the traditional model- 
ing trades-off between robustness/accuracy and performances/scalability in 
simulations. 

Finally, the FEAST fundamental framework for first-principle electronic 
structure calculations can be used independently of the choice for the physical 
model (e.g. Density- Functional Theory or Hartree-Fock), the nature of the 
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atomistic system (e.g. isolated or Bloch periodic), or the choice for the basis 
set (e.g. PW, atomic orbitals, real-space mesh). 
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Figure 3: Scalability of the number of nodes N, Nj, and Nq, respectively for the full, atom- 
centered, and interstitial meshes, with the number of atoms per molecule. The full mesh has 
been reconstructed from the muffin-tin atom-centered and interstitial meshes obtained using 
a cubic tetrahedra FEM discretization. We note here that the number of surface points at 



the boundary Fj of the atoms is here nj 
with N at the number of atoms. 



Nj = 9457 and N = N + (Nj - nj) * N at 



Figure 4: 2D representation of the 3D finite element discretization used to simulate the 
molecule (on the left), and the + molecule (on the right). Only the molecular region is 
represented since the coarse mesh extend to SA (from the middle) in each direction. All 
atom-centered region meshes are identical, and their radius has been set to 0.35^4. 
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